A.  Acoustic-Phonetic Recognition

Much  of  this  past  quarter  was  spent  in  designing, 
implementing,  and  testing  the  interface  between  the 
Acoustic-Phonetic  Experiment  Facility  (APEF)  and  the 
Acoustic-Phonetic  Recognition  (APR)  program.  The  goal  of 
this  interface  is  to  allow  the  APR  to  correctly  adjust  the 
scores  for  each  phoneme  against  each  segment  in  the  segment 
lattice  according  to  the  particular  acoustic  feature  values 
found  within  that  segment. 

As a test of the effect of this individual adjustment
to phoneme scores, we attempted to discriminate among the
three nasal consonants. The APR program previously used
conventional threshold decisions to choose one of several
labels for each segment. The scores for particular phonemes
were determined by the statistics of the confusions between
these segment labels and the correct phonemes. The highest
scoring nasal phoneme was correct 70-75% of the time. With
the non-parametric modeling procedure, which uses
information from the APEF, the first choice nasal was
correct 90% of the time. More important, for those segments
where the first choice was correct, the scores on the other
nasals were often decreased very sharply. For those
segments where the first choice was incorrect, the correct
nasal had a score near that of the top scoring nasal.

Speaker Normalization


During  the  past  quarter  we  also  discussed  several 
possible  speaker/recording  environment  normalization 
procedures. Currently, the APR is speaker independent.
That is, any normalizing parameters used are derived
directly from the utterance being recognized with no other
knowledge about the speaker. While these techniques can
perform  quite  well,  the  APR  could  be  somewhat  more  accurate 
if  there  were  some  outside  knowledge  about  this  particular 
speaker.  Some  of  the  normalization  procedures  we  discussed 
are  as  follows: 


1) Using a carefully designed, phonetically balanced
utterance, one could extract a small number of useful
acoustic parameters which were known to be speaker
dependent and not easily derivable from an utterance of
unknown phonetic content (e.g., average fricative
spectra). These could then be used as thresholds in
the APR program.

For  some  uses  of  a speech  understanding  system,  it 
would  be  worth  having  a trained  speech  technician 
extract  these  numbers.  It  would  also  be  possible  to 
design  a system  which  would  be  able  to  deal  with  the 
known  utterance,  and  automatically  extract  the  data. 

2) Since the statistics of phoneme/segment label
confusions in some way reflect the particular speaker
characteristics, one could imagine weighting the
confusion matrix heavily for that speaker. This would
require a large amount of speaker training and would
only be useful for some applications. Of course,
average statistics could always be used until the
speaker's identity or speaker characteristics were
determined.


3)  An  extreme  case  of  speaker  training  would  involve 
deriving  the  acoustic  probability  distributions  that 
determine  segmentation  and  labeling  from  the  speech  of 
only  one  speaker,  instead  of  from  a mix  of  speakers. 
In  principle,  the  structure  of  the  algorithms 
themselves could even vary, though parametric variation
with a fixed structure would probably be more
practical. Those acoustic recognition programs that
use raw spectral matching to determine phonetic content
often need this type of speaker tuning [Dixon, 1976,
p. 9; Bakis, 1976, p. S97; Lowerre, 1976, p. S97].

We have done one experiment on case (2) by using
confusion matrix statistics which were heavily weighted
toward utterances by the speaker of the utterance, but the
results are so far inconclusive. Offsetting the potential
advantage to be gained from single-speaker training was a
significant reduction in the size of the available set of
training utterances and an increased risk of not having
important phenomena represented.

B.  Lexical Retrieval

During the past quarter, Lexical Retrieval's scoring
algorithm was modified to permit a more accurate scoring of
alignments that involve segmentation errors. When aligning
a phoneme sequence with a segment lattice, the Lexical
Retrieval component permits three kinds of "incremental"
alignments [Klovstad, 1976]:

1) Match - an alignment of one phoneme with one segment.

2) Split - an alignment of two consecutive phonemes with
one segment.

3) Merge - an alignment of one phoneme with two
consecutive segments.

Previously, only Matches were scored accurately.
During the past quarter, additions were made to three
distinct parts of the system in order to permit a more
accurate scoring of the Split and Merge alignments.
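
The way the three incremental alignments combine can be
illustrated with a small dynamic program. The sketch below is
not the Lexical Retrieval algorithm itself: it assumes a
single linear string of segments rather than a lattice, and
the function name and the three scoring callables (match,
split, merge) are hypothetical stand-ins that return log
scores.

```python
import math

def best_alignment_score(phonemes, segments, match, split, merge):
    """Best total log score for aligning phonemes with a linear segment string.

    match(p, s), split(p1, p2, s) and merge(p, s1, s2) are hypothetical
    callables returning log scores for the three incremental alignments.
    Returns -inf when no alignment exists.
    """
    n, m = len(phonemes), len(segments)
    neg_inf = -math.inf
    # best[i][j]: best score aligning the first i phonemes with the first j segments.
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            candidates = [best[i][j]]
            if i >= 1 and j >= 1:
                # Match: one phoneme with one segment.
                candidates.append(best[i - 1][j - 1]
                                  + match(phonemes[i - 1], segments[j - 1]))
            if i >= 2 and j >= 1:
                # Split: two consecutive phonemes with one segment.
                candidates.append(best[i - 2][j - 1]
                                  + split(phonemes[i - 2], phonemes[i - 1],
                                          segments[j - 1]))
            if i >= 1 and j >= 2:
                # Merge: one phoneme with two consecutive segments.
                candidates.append(best[i - 1][j - 2]
                                  + merge(phonemes[i - 1], segments[j - 2],
                                          segments[j - 1]))
            best[i][j] = max(candidates)
    return best[n][m]

# Toy scorers, purely illustrative (not values from the system).
def toy_match(p, s):
    return 0.0 if p == s else -5.0

def toy_split(p1, p2, s):
    return -1.0

def toy_merge(p, s1, s2):
    return -2.0

score = best_alignment_score(list("abc"), ["a", "x"],
                             toy_match, toy_split, toy_merge)
```

With these toy scorers, three phonemes can only cover two
segments by using exactly one Split, so the quality of the
Split score directly determines the result.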

First  we  found  that  in  order  to  calculate  Split 
probabilities,  statistics  on  the  frequency  of  consecutive 
phonemes  were  needed.  This  was  accomplished  by  an  extension 
of  the  statistics  gathering  package. 

Secondly,  we  needed  to  create  scoring  matrices  for  both 
Split  and  Merge  alignments.  Let  NS  be  the  number  of 
different  segments  (NS  = 83  in  the  current  system)  and  NP, 
the  number  of  different  phonemes  (NP  = 105  in  the  current 
system). Then the number of possible Split alignments is
NS*NP*NP (915,075) and the number of possible Merge
alignments is NS*NS*NP (723,345). Since our data base is
somewhat  limited  and  these  kinds  of  segmentation  errors 
occur  relatively  infrequently  (approximately  3 percent  of 
the  samples),  the  possibility  of  calculating  each  of  these 
probabilities  was  clearly  out  of  the  question.  We  wanted  to 
use  the  alignment  statistics  available  from  our  data  base 
and  restrict  ourselves  to  a more  manageable  problem.  Our 
solution  was  to  map  the  segments  and  phonemes  into  segment 
classes  and  phoneme  classes  respectively.  This  permitted 
the  creation  of  substantially  smaller  Split  and  Merge 
matrices  that  were  indexed  on  the  basis  of  these  classes. 
This smeared the statistics somewhat but greatly reduced the
size of the matrices. The program that had previously
produced the Match scoring matrix was extended to create
these two additional matrices.
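
The size argument can be checked with a few lines of
arithmetic. In the sketch below, NS and NP are the values
given in the text; the class counts NSC and NPC are
hypothetical, chosen only to show how strongly a class
mapping shrinks the matrices (the actual numbers of classes
are not stated here).

```python
NS = 83    # number of different segments (from the text)
NP = 105   # number of different phonemes (from the text)

def split_entries(num_segments, num_phonemes):
    # A Split pairs two consecutive phonemes with one segment.
    return num_segments * num_phonemes * num_phonemes

def merge_entries(num_segments, num_phonemes):
    # A Merge pairs one phoneme with two consecutive segments.
    return num_segments * num_segments * num_phonemes

# Hypothetical numbers of segment classes and phoneme classes.
NSC = 10
NPC = 12

full_split = split_entries(NS, NP)
full_merge = merge_entries(NS, NP)
class_split = split_entries(NSC, NPC)
class_merge = merge_entries(NSC, NPC)

print("Split entries: full", full_split, "class-indexed", class_split)
print("Merge entries: full", full_merge, "class-indexed", class_merge)
```

Because the counts grow with the square of one dimension,
even a modest reduction in the number of classes cuts the
number of probabilities to be estimated by several orders of
magnitude.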

Thirdly,  the  search  algorithms  used  by  the  Lexical 
Retrieval  component  had  to  be  modified  to  use  the  new 
probabilistic Split and Merge scores. Other changes such as
ones to the alignment programs were also made for system
compatibility.

As a result of using these new probabilistic scores, we
observed  a definite  improvement  in  the  overall  performance 
of  the  Lexical  Retrieval  component.  We  expect  further 
improvement  as  additional  sentences  become  available  as  part 
of  the  data  base  from  which  the  statistics  are  gathered 
since:  1)  better  estimates  of  the  current  "incremental" 
alignment  probabilities  will  be  possible,  and  2)  mappings  to 
more  classes  will  be  possible  for  the  determination  of  Split 
and  Merge  scoring  matrices. 

C.  Semantics

1.  Grammar  for  Semantic  Interpretation 

This quarter the grammar was extensively modified to
build semantic interpretations instead of syntactic parse
trees. This change was motivated by the fact that with our
grammar encoding semantic as well as syntactic information,
producing purely syntactic structures meant that
information was being thrown away. Such information would
then later be reintroduced by the interpretation rules. By
eliminating the middle step, we not only speed up the system
but reduce its size by an entire TENEX fork.

The  grammar  now  builds  interpretations  by  accumulating 
in  registers  the  semantic  head,  quantifier,  and  links  of  the 
nodes being described in the sentence. For example, the
sentence

    "I will go to Chicago for the ASA meeting."

yields the interpretation

    (FOR: THE Y / (FIND: DB/MEETING (SPONSOR ASA)) : T ;
        (BUILD: DB/TRIP (DESTINATION X)
                        (TRAVELER SPEAKER)
                        (TO/ATTEND Y)
                        (TIME (AFTER NOW))))

This interpretation is built up in the following way. The
PUSH arc that looks for a constituent describing a Person at
the start of the sentence will transform the pronoun "I"
into the link-node pair (TRAVELER SPEAKER) and return this
as the interpretation of that constituent. The word "will"
adds (TIME (AFTER NOW)) to the list of link-node pairs being
accumulated. (The grammar does not accept constructions
like "will have gone," so "will" can currently always be
interpreted as marking a future event.) The word "go" sets
the semantic head to DB/TRIP. The constituent "to Chicago"
parses with the interpretation

(DESTINATION  (!  THE  X LOCATION  ((CITY  CHICAGO)))). 

(The  ! indicates  that  a FOR:  expression  will  have  to  be 
built as part of the interpretation.) Similarly, "the ASA
meeting" produces

(TO/ATTEND  (!  THE  Y DB/MEETING  ((SPONSOR  ASA)))). 

The  top  level  of  the  grammar  has  thus  accumulated  the 
link-node  pairs 

(DESTINATION  (! )) 

(TIME  — ) 

(TRAVELER  SPEAKER) 

(TO/ATTEND  (! ))) 

with  the  semantic  head  DB/TRIP.  The  appropriate  action  (in 
this  case  a BUILD:)  is  created,  and  the  necessary 
quantificational expressions are expanded around it.
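
The final expansion step can be sketched as follows. This is
an illustration only, not the grammar's actual code: the
function names, the tuple encoding of nodes, and the exact
layout of the FOR: and FIND: forms are assumptions. A node
marked with "!" is replaced by its variable inside the
action, and a quantificational expression is wrapped around
the action for each such node.

```python
def to_sexpr(x):
    """Render nested tuples as a parenthesized expression string."""
    if isinstance(x, (tuple, list)):
        return "(" + " ".join(to_sexpr(item) for item in x) + ")"
    return str(x)

def build_interpretation(action, head, pairs):
    """Combine a semantic head with accumulated link-node pairs.

    A node of the form ("!", det, var, cls, restrictions) stands for a
    quantified node; the FOR:/FIND: layout used here is an assumed notation.
    """
    links = []
    quantified = []
    for link, node in pairs:
        if isinstance(node, tuple) and node[:1] == ("!",):
            _, det, var, cls, restrictions = node
            quantified.append((det, var, cls, restrictions))
            links.append((link, var))      # the action refers to the variable
        else:
            links.append((link, node))
    expr = (action, head) + tuple(links)
    # Wrap one quantificational expression around the action per "!" node;
    # the first quantified node ends up outermost.
    for det, var, cls, restrictions in reversed(quantified):
        find = ("FIND:", cls) + tuple(restrictions)
        expr = ("FOR:", det, var, "/", find, ":", "T", ";", expr)
    return expr

pairs = [
    ("DESTINATION", ("!", "THE", "X", "LOCATION", (("CITY", "CHICAGO"),))),
    ("TIME", ("AFTER", "NOW")),
    ("TRAVELER", "SPEAKER"),
    ("TO/ATTEND", ("!", "THE", "Y", "DB/MEETING", (("SPONSOR", "ASA"),))),
]
text = to_sexpr(build_interpretation("BUILD:", "DB/TRIP", pairs))
print(text)
```

The nesting order of the quantifiers in this sketch simply
follows the order of the pairs; the order used by the grammar
may differ.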


Currently, each level of the grammar produces the
regular syntactic parse tree and pops the semantic
interpretations that it has built in parallel as a
feature. As soon as the grammar has been thoroughly checked
out, the syntactic registers and parse trees will be
eliminated.

This major change to the grammar has necessitated a
number of changes to the semantic network. For example, the
per diem associated with a city had been represented as a
property (one-way link) of the city. In English, however,
the per diem is usually referenced by a noun phrase as in
"What is the per diem for Chicago?" The most natural
interpretation results in a per diem structure, which itself
has properties (e.g., a dollar value) and relations (e.g.,
an associated city). We are now restructuring the relevant
parts of the network to be compatible with the resultant
interpretations. At the same time the network is being
enlarged to include all the place and people names in
BIGDICT. There are now approximately 2400 nodes in the
network.

In  addition  to  network  changes,  there  are  also  a number 
of  changes  being  made  to  the  retrieval  functions  and  new 
METHODS  are  being  added  for  the  "fictitious  links"  that 
appear  in  interpretations  [Bruce  and  Harris,  1975]. 

2.  Parser 

During the past quarter, work on the parser has
centered on fixing bugs, implementing a facility for
handling island collision events, and designing
modifications to increase the number of syntactic events
that can be processed (including a garbage collector for
path configurations and a facility for swapping some of the
data arrays). Features have also been added to provide for
the computation of a syntactic likelihood score by actions
on the arcs of the grammar. This syntactic likelihood score
will be used initially to provide score adjustments to
events as a result of confirmation or disconfirmation of
prosodic hypotheses made by arcs of the grammar. The
implementation of this facility is one of the steps
necessary to the incorporation of the UNIVAC boundary
detection programs into the system, a step that we hope to
be able to try.

In addition, the parser has been modified to permit the
lifting  of  registers  from  one  level  to  the  next  as  features 
in  order  to  pass  along  semantic  interpretations. 

D.  Verification


During the past quarter, we extended the scoring
mechanism of the Verification component to provide
log-likelihood ratios of verified word scores. To do this,

spectral  distance  scores  (old  scores)  were  collected  from 
300  words  that  had  been  verified  by  the  speech  understanding 
system.  Of  these  300  words,  approximately  50  were  correct, 
"correct"  being  defined  as  having  verified  the  proper 
phonetic  spelling  over  the  appropriate  region  of  the 
utterance.  We  created  two  histograms  based  on  old  scores, 
one for correct words only and the other for all words. We
then modeled each of the histograms according to
[Makhoul  and  Schwartz,  1975,  pp.  50-65]  in  order  to  provide 
smooth  continuous  probability  density  functions.  These  two 
models  were  entered  into  the  Verification  component.  We  now 
compute  the  old  score  as  before,  then  divide  the  probability 
of  that  score  for  correct  words  by  the  probability  of  that 
score  for  all  words.  We  take  the  log  of  this  ratio  which 
gives  us  the  log-likelihood  ratio  or  new  score.  The  control 
component  has  been  appropriately  modified  to  accept  this 
score  and  combine  it  with  the  log-likelihood  score  returned 
by  the  Lexical  Retrieval  component. 
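
The computation of the new score can be sketched as below.
The two density models in the Verification component come
from the smoothed histograms described above; here they are
replaced by Gaussian densities with made-up parameters, which
is purely an assumption for illustration, as are the function
names.

```python
import math

def gaussian_pdf(mean, std):
    """Stand-in density model; the real models are smoothed histograms."""
    def pdf(x):
        z = (x - mean) / std
        return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
    return pdf

def log_likelihood_ratio(old_score, p_correct, p_all):
    """New score: log of p(old score | correct) over p(old score | all words)."""
    return math.log(p_correct(old_score) / p_all(old_score))

# Hypothetical models of the spectral distance score (smaller is better).
p_correct = gaussian_pdf(mean=2.0, std=1.0)
p_all = gaussian_pdf(mean=5.0, std=2.0)

good = log_likelihood_ratio(2.0, p_correct, p_all)   # typical of correct words
poor = log_likelihood_ratio(8.0, p_correct, p_all)   # typical of incorrect words
print(good, poor)
```

A positive new score means the old score is more typical of
correct words than of words in general, and a negative one
means the opposite. Being a log-likelihood quantity, it can
be added to the log-likelihood score from Lexical Retrieval.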

A new  synthesis-by-rule  program  has  been  developed 
during  this  quarter,  differing  from  its  predecessor  in  that 
it  produces  synthesis  parameters  to  drive  a linear 
predictive waveform synthesizer. This was done in order to
make the synthesis output more compatible with the error
metric used in the Verification component. At the quarter's
end,  a new  version  of  the  Verification  component  based  on 
this  synthesis  program  was  being  assembled. 

E.  System and Control Performance

1.  Recognition Strategies

During  the  past  quarter,  we  continued  our  development 
of  shortfall  control  strategies  and  built  experimental 
versions  of  the  system  corresponding  to  each  of  three  of 
them.  In  addition,  we  implemented  an  initial  version  of  a 
new  control  strategy,  called  a "bounded-breadth  left-end" 
strategy,  resulting  in  a fourth  version  of  the  system. 
Performance  results  for  these  four  evolutionary  stages  of 
the  system  are  given  in  the  next  section. 

The  first  three  systems  differ  from  previous  ones 
primarily  in  their  control  strategy.  (These  changes  will  be 
described  briefly  below.)  The  Acoustic-Phonetic  Recognizer, 
Lexical  Matcher,  and  Dictionary  (i.e.,  phonological  rules) 
are  basically  unchanged,  although  part  way  through  the  June 
18  system  testing,  it  was  discovered  that  the  APR  confusion 
statistics  in  use  had  been  computed  using  an  algorithm  that 
was  thought  to  have  been  rejected  some  time  ago.  A change 
to  the  "correct"  confusion  statistics  was  made  instantly. 
With  respect  to  the  higher-level  components,  the  only 
differences  lie  in  small  changes  made  to  the  details  of  the 
grammar.

In the first of these three systems, referred to as the
"May  23  system,"  the  "credit"  heuristic  [Woods,  1976, 
pp.  138-141]  was  added  to  the  shortfall  density  method  of 
computing  priority  scores  on  events.  In  addition,  the  score 
of  a word-match  was  changed  to  take  into  account  the 
"pronunciation  likelihood"  score  of  the  given  pronunciation. 
This  is  derived  during  the  dictionary  expansion  phase  from 
likelihoods  of  phonological  rules  being  applied  during  the 
expansion  process. 

In  the  second  of  these  systems,  called  the  "June  6 
system,"  island  collision  events  were  added.  That  is,  as 
words  are  added  to  a theory,  checks  are  performed  to  see  if 
the  added  words  correspond  to  words  previously  added  to 
other  theories  in  the  opposite  direction.  For  each  such 
"collision,"  an  event  is  made  that  merges  the  two  events, 
and  it  takes  its  place  on  the  event  queue  with  a score 
appropriate  to  the  word  matches  in  the  combined  hypothesis. 
As  with  all  events,  the  syntactic  consistency  of  the  new 
event  is  not  checked  unless  and  until  the  event  becomes  the 
top  element  on  the  event  queue. 

Another  significant  change  was  in  the  "rectification" 
of  adjacent  word  matches  in  a theory  — i.e.,  the  rejection 
of  paths  that  use  incompatible  adjacent  word  matches. 
Formerly,  the  score  of  a series  of  adjacent  (fuzzy)  word 
matches  was  the  sum  of  the  best  individual  word  match  in 
each fuzzy, regardless of whether or not they were
consistent  with  each  other.  In  the  June  6 system,  only  word 
matches  with  common  boundaries  are  allowed  to  abut,  and  the 
scoring  of  adjacent  fuzzy  word  matches  involves  examining 
the  allowable  word  match  pairs  and  picking  the  consistent 
path  with  the  best  score.  Also,  priority  scoring  was 
computed  using  shortfall  density  alone,  with  neither  credit 
nor  liability. 
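
The boundary-consistent scoring of adjacent fuzzy word
matches can be illustrated with a small dynamic program. The
sketch assumes that each fuzzy is a list of word matches
written as (left boundary, right boundary, score), that
higher scores are better, and that two matches may abut only
when the right boundary of one equals the left boundary of
the next. The function name and data layout are hypothetical.

```python
def best_consistent_path(fuzzies):
    """Pick one word match per fuzzy so that adjacent matches share a boundary.

    Returns (total score, list of chosen matches), or None if no consistent
    path exists.
    """
    if not fuzzies:
        return None
    # best maps a right boundary to the best (total, path) ending there.
    best = {}
    for left, right, score in fuzzies[0]:
        if right not in best or score > best[right][0]:
            best[right] = (score, [(left, right, score)])
    for fuzzy in fuzzies[1:]:
        extended = {}
        for left, right, score in fuzzy:
            if left in best:                      # common boundary required
                total = best[left][0] + score
                if right not in extended or total > extended[right][0]:
                    extended[right] = (total, best[left][1] + [(left, right, score)])
        best = extended
    if not best:
        return None
    return max(best.values(), key=lambda entry: entry[0])

fuzzies = [
    [(0, 5, 10.0), (0, 6, 12.0)],
    [(5, 9, 8.0), (7, 9, 20.0)],
]
total, path = best_consistent_path(fuzzies)
print(total, path)
```

In this example the sum of the best individual matches
(12 + 20) would join two matches that do not share a
boundary; the consistent path scores 10 + 8 instead.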


In  the  June  18  system,  this  concept  of  rectification 
was  extended  to  involve  not  just  boundary  consistency,  but 
full  phonological  consistency.  That  is,  control  is  now 
aware  of  which  member  of  a context  fuzzy  word  match  has  been 
an  anchor  for  each  new  word  match,  and  it  uses  this 
information  in  rectification.  This  brings  to  bear  the  full 
effect  of  the  word-boundary  phonological  rules  employed  in 
the  Lexical  Matcher.  In  addition,  several  bugs  were  fixed 
in the computation of shortfall and credit scores, and
priority  scores  are  once  again  computed  as  shortfall  density 
plus  credit.  However,  a major  bug  remains  in  the  June  18 
system  affecting  the  scoring  of  seed  words  with  phonological 
word boundary effects. This should be fixed in a later
version.

One of the problems with the shortfall control
strategies is the large number of events that must usually
be processed in order for one theory to grow large enough to
span the entire utterance. Several seeds are effectively
started in parallel, and their successive generations of
descendants grow rapidly. The usual mode of failure of our
shortfall systems was for the system to run out of space in
either the control or syntax fork, or for it to hit an
arbitrarily imposed 2 cpu-hour time-out. By this time, the
system would usually have processed between 60 and 100
events.

For  these  reasons,  we  have  also  implemented  a different 
type  of  control  strategy,  which  we  have  dubbed  a 
"bounded-breadth  left-end"  strategy.  In  essence,  the 
procedure  is  as  follows: 


1.  Scan  for  possible  utterance-initial  words  at  the 
left  end  of  the  utterance.  Form  an  initial  event 
queue  from  the  resulting  seed  events. 


2. Order the event queue by event score (word match
quality), then discard all but the best N events.

3.  If  all  events  span  the  utterance,  go  to  step  3b. 

3a.  Select  for  syntactic  processing  the  event 
whose  duration  is  the  shortest,  but  do  not  select 
any  event  that  spans  the  utterance. 

3b.  Select  the  top  (best  scoring)  event. 

4. Give that event to Syntax for syntactic
verification.

4a. If the event spans and is linguistically
well-formed and complete, declare that to be the
interpretation.

4b. If the event is rejected as ill-formed, go
to step 2.

4c. If Syntax proposes words and classes that
might occur to the right, give the proposals to
Lexical Retrieval. Form new events from any word
matches that come back and add them to the event
queue. Go to step 2.


This  strategy  effectively  amounts  to  starting  off  N events 
at  the  left  end  of  the  utterance  and  forcing  the  best  N (or 
fewer)  events  at  each  point  all  the  way  to  the  right  end. 
Once  all  events  hit  a possible  right  end  boundary,  the 
highest  scoring  syntactically  acceptable  event  (if  any)  is 
declared  the  winner. 
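
The procedure above can be sketched in a few lines. This is a
minimal illustration, not HWIM's control component: Syntax
and Lexical Retrieval are replaced by hypothetical callables
(check_syntax and propose_matches), an event is reduced to
its words, right boundary and score, and the sketch assumes
that an event handed to Syntax leaves the queue.

```python
from collections import namedtuple

# A left-anchored partial theory: its words, the position of its right
# boundary, and its accumulated word-match score (higher is better).
Event = namedtuple("Event", "words right score")

def bounded_breadth_left_end(seeds, utterance_end, n, check_syntax, propose_matches):
    queue = list(seeds)                                    # step 1
    while queue:
        queue.sort(key=lambda e: e.score, reverse=True)    # step 2
        del queue[n:]
        partial = [e for e in queue if e.right < utterance_end]
        if partial:
            event = min(partial, key=lambda e: e.right)    # step 3a
        else:
            event = queue[0]                               # step 3b
        queue.remove(event)   # assumption: a processed event leaves the queue
        verdict = check_syntax(event)                      # step 4
        if verdict == "complete" and event.right >= utterance_end:
            return event                                   # step 4a
        if verdict == "reject":
            continue                                       # step 4b
        for word, right, gain in propose_matches(event):   # step 4c
            queue.append(Event(event.words + (word,), right, event.score + gain))
    return None

# Toy stand-ins for Syntax and Lexical Retrieval (illustrative only).
SENTENCE = ("show", "me", "trips")
LATTICE = {0: [("show", 2, 5.0), ("so", 1, 3.0)],
           1: [("me", 3, 1.0)],
           2: [("me", 3, 4.0), ("my", 3, 4.5)],
           3: [("trips", 5, 6.0)]}

def toy_syntax(event):
    if event.words == SENTENCE:
        return "complete"
    if event.words == SENTENCE[:len(event.words)]:
        return "extend"
    return "reject"

def toy_lexical(event):
    return LATTICE.get(event.right, [])

seeds = [Event((w,), r, s) for w, r, s in LATTICE[0]]
result = bounded_breadth_left_end(seeds, 5, 8, toy_syntax, toy_lexical)
print(result)
```

In this toy case, setting N to 1 makes the search fail: the
better-scoring but wrong match "my" pushes the correct "me"
off the queue, so the correct word never reaches Syntax.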


This  remarkably  simple  control  strategy  has  an  upper 
bound  on  the  number  of  events  to  be  processed,  on  the  order 
of  N times  the  number  of  words  in  the  utterance.  This  is 
clearly  linear  with  respect  to  the  length  of  the  utterance, 
not  exponential,  as  in  the  island-driven  strategies.  Also, 
the  partial  interpretations  of  the  utterance  are  anchored  to 
the  left  end,  which  provides  rather  tighter  syntactic 
constraints  at  each  step  than  is  the  case  with  the 
middle-out  strategies.  The  disadvantages,  of  course,  are 
that  the  system  must  find  the  leftmost  word  in  the  initial 
scan,  and  it  has  no  more  than  one  chance  at  each  choice 
point  to  find  each  successive  correct  word,  and  furthermore, 
to  find  it  with  a sufficiently  good  score  for  it  to  be  kept 
on  the  queue  of  maximum  length  N. 

Such a "bounded-breadth left-end" control strategy was
made  an  option  in  the  "July  22  system."  Given  the  way  in 
which  HWIM's  control  component  is  implemented,  the  inclusion 
of  this  strategy  required  the  addition  of  only  a handful  of 
functions  and  the  setting  of  several  existing  option  flags. 
With  the  maximum  queue  length  N set  to  8,  and  utterances  of 
3 to  9 words  each,  the  system  only  rarely  runs  out  of 
resources  before  terminating.  The  July  22  system  also 
included  addition  to  the  Lexical  Retrieval  component  of 
probabilistic  split  and  merge  scoring,  as  described  in 
Section  B. 

2.  System  Performance 


We have been testing system performance on several sets
of utterances: (a) the three sets of 20 utterances each
selected by SCRL, and designated by us as the "March",
"April", and "May" sets; (b) six of the "May" utterances,
re-recorded  in  a very  quiet  room,  designated  the  "June"  set. 
These  were  re-recorded  to  test  the  hypothesis  that  the 
higher  noise  level  in  our  new  laboratory  is  detrimental  to 
the  operation  of  the  acoustic-phonetic  recognition;  (c)  We 
also  have  been  using  two  other  sets  of  10  utterances  each, 
from  our  collection  of  on-line  utterances  dating  before 
February  1976.  Since  some  of  these  utterances  were  used  to 
tune  the  APR,  we  do  not  regard  results  obtained  with  them  as 
being  indicative  of  system  performance  on  new  utterances. 
"Control set #1" is made up of 10 utterances on which the
March-vintage  systems  succeeded;  "control  set  #2"  is  made  up 
of  utterances  on  which  the  March-vintage  systems  failed.  So 
the  first  control  set  represents  "good"  utterances  on  which 
we  should  expect  to  continue  to  succeed;  the  second 
represents  utterances  on  which  new  successes  are  sought. 

The  utterance-successes  for  the  four  versions  of  the 
system  are  summarized  below,  where  the  four  dates  heading 
the  columns  represent  the  versions  of  the  system  described 
in  the  previous  section. 


               May 23    June 6    June 18    July 22

March            --        --       3/20       7/20
April            --        --       2/20       4/20
May             1/20      1/20      1/20       5/20
June             --        --       1/6        0/6

TOTAL (M-J)                        7/66=11%    16/66

C.S. #1         8/10       --       7/10       8/10
C.S. #2         0/10       --       0/10       1/10

Abstract

The  BBN  Speech  Understanding  System  (dubbed  HWIM,  or 
Hear  What  I Mean)  contains  knowledge  sources  at  the  levels 
of  acoustic-phonetics,  phonology,  vocabulary,  syntax, 
semantics,  factual  knowledge,  and  discourse.  This  paper 
describes  how  these  knowledge  sources  are  realized  in  the 
nine  functional  components  of  the  system.  It  also  discusses 
the  control  strategy  for  generating,  evaluating,  and 
extending  hypotheses  into  a complete  understanding  of  the 
spoken  utterance. 


Nash-Webber, B.L., "Semantic Interpretation Revisited," BBN
Report  No.  3335,  Bolt  Beranek  and  Newman  Inc.,  Cambridge, 
MA.  Also,  presented  at  COLING-76,  Ottawa,  Canada,  28  June-2 
July  1976. 

Abstract 

A brief  overview  is  given  of  the  BBN  LUNAR  system. 
This  is  followed  by  a discussion  of  two  of  its  deficiencies: 
the  simple  "enumerate  and  test"  processing  of  quantified 
expressions  in  its  meaning  representation  language  and  its 
inadequate  treatment  of  anaphora.  We  then  present  a rough 
classification  of  anaphoric  expressions  as  groundwork 
towards  formulating  a general  computational  treatment  of  the 
phenomenon.  In  this  classification,  we  establish  a 
distinction between denotational anaphora (references to
previously mentioned objects, sets, events, states, etc.)
and descriptional anaphora (references to previous
descriptions). Finally, we present an initial sketch of both
a formal meaning representation language and some procedures
seen as needed for manipulating sentences of that language,
which may provide a handle on some aspects of anaphor
resolution.


Nash-Webber,  B.L.  and  B.  Bruce,  "Evolving  Uses  of  Knowledge 
in  a Speech  Understanding  System,"  presented  at  COLING-76, 
Ottawa,  Canada,  28  June-2  July  1976. 

Bruce, B., "Discourse Influences on Language Generation,"
presented  at  COLING-76,  Ottawa,  Canada,  28  June-2  July  1976. 